ABSTRACT

This study describes a computerized item-selection 
program called PAIN that uses a pattern-analysis approach 
to select a most-valid subset of items from a set. The 
results of this study indicate that PAIN is capable of 
selecting a small subset of items which, when scored by 
pattern analysis, has greater validity than the original 
set. It appears that, as well as reducing the sizes of 
standard tests without losing predictive value, PAIN may 
also be of value in selecting biographical items of
information for use as predictors.
I. INTRODUCTION



Testing has played, and in all likelihood will continue
to play, a major role in the classification and selection
processes of both industry and the military. The vast
amounts of time and money expended in this area warrant
investigation of any methods which might increase the
efficiency of the techniques involved or improve the end
results.

It was the objective of this study to investigate one 
such method. 



II. NATURE OF THE PROBLEM 



Testing has played an important role in the military
system of classification and placement for many years.
Basic schooling assignments and eligibility for promotion 
are just two of the more important areas that have been 
greatly influenced by testing and test interpretation. Yet, 
on many occasions the critical time element involved in 
testing and the lack of quality information available from 
tests have combined to make classification and placement a 
haphazard affair. 

The U.S. Navy has taken steps to reduce the magnitude
of the testing problem by developing a computer program
called SEQUIN (SEQUential Nominator). As a result of the
use of SEQUIN it has been shown not only that the size of
a test may be reduced without loss of validity, but also
that validity may actually be increased by using a
specially selected subset of questions from the original
test. In some cases as few as seven items from a test were
found to provide information equal to or better than that
provided by the complete test. This being the case, it
appeared that pattern analysis of a few selected items
from a test might be feasible.

The problem of using pattern analysis on a test even as
small as 30 items is one of sheer size. A 30-item test
yields over a billion possible patterns. The evaluation
of this number of patterns is a formidable job even for
a computer, not to mention the problem of interpreting
individual results once all the patterns have been
evaluated. In fact, in order to establish a predictor
value for each pattern that could be encountered, at
least a billion subjects would already have had to take
the test under consideration.

Reducing the size of a test to seven items means that
only 128 patterns have to be analyzed. The number of
patterns is found by raising 2 to the power given by the
number of items in the test (2^7 = 128). A subset of seven
items would thus be suitable for pattern analysis.
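As a quick check of the 2^N rule, here is a short Python sketch. It is a modern illustration, not one of the FORTRAN programs used in this study, and the function name `pattern_count` is hypothetical.

```python
def pattern_count(n_items: int) -> int:
    """Number of distinct response patterns for n binary (correct/incorrect) items."""
    # Each item contributes one binary digit, so the count doubles with every item added.
    return 2 ** n_items


for n in (7, 30):
    print(f"{n} items -> {pattern_count(n):,} patterns")
```

Run as written, it prints 128 patterns for seven items and 1,073,741,824 for thirty, consistent with the "over a billion" figure for a 30-item test.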

The objective of this study was to devise and evaluate 
a method of selecting items from a test that would optimize 
the validity of the subset selected when scored by pattern 
analysis. 
III. DEVELOPMENT OF A SOLUTION



A. SUBJECTS OF THE STUDY 

The records of approximately 2,400 U.S. Navy enlisted
men who had attended the Electronics Technician School at
San Diego, California, after taking the Electronics
Technician Selection Test (ETST) were used as the source
data of this study. The validation sample consisted of the
first 1,500 subjects in the records who had completed the
course of instruction and been assigned a final school
grade. The cross-validation sample was composed of the
next 750 subjects who met the completion and final-grade
assignment requirements.

The ETST is made up of three parts totaling 70 items.
Part I consists of 20 items designed to test the subject in
the area of mathematics. Part II also has 20 items and is
related to science. Part III has 30 items directed at
testing knowledge in the areas of electricity and radio.

Each of the items on the ETST was treated as a predictor
variable to be compared with the criterion of final school
grade at the Electronics Technician School.

The computer programs used in this study were written in
the FORTRAN language and run on the IBM 360 computer at the
U.S. Naval Postgraduate School, Monterey, California. The
INTEGER*2 numbering convention was used where possible in
programming to conserve core storage. The increased running
time caused by this convention was not considered critical
for this study.

B. DATA CONVERSION

The program to select items for pattern analysis was
developed on the premise that every item of the set being
considered could be expressed in the form of a "yes-no" or
"correct-incorrect" answer. This simplified the programming
by allowing the item responses to be handled in binary
form.

The raw data were not suited to manual handling because
over 168,000 responses (2,400 subjects x 70 items) required
coding. The conversion was done by the conversion program
shown in the COMPUTER PROGRAMS section (p. 31), which
facilitated the handling of the large volume of
information. Most of this program is unique to the
situation imposed on the author by the form of the data
available. However, the comments contained within the
program provide a guideline to the steps required in
converting data regardless of the nature of the data.
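The heart of that conversion, scoring each recorded answer against an answer key, can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the author's program: the function `to_binary` and the five-item key are hypothetical, and the `list[int]` annotation needs Python 3.9 or later.

```python
from typing import Sequence


def to_binary(answers: Sequence[str], key: Sequence[str]) -> list[int]:
    """Score one subject's answers against the key: 1 if correct, 0 otherwise."""
    if len(answers) != len(key):
        raise ValueError("answer sheet and key must have the same number of items")
    return [1 if given == correct else 0 for given, correct in zip(answers, key)]


# Hypothetical five-item key and answer sheet (the ETST itself has 70 items).
key = ["A", "C", "B", "D", "A"]
answers = ["A", "B", "B", "D", "C"]
print(to_binary(answers, key))  # [1, 0, 1, 1, 0]
```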

C. PAIN 

The author desired to develop a computer program, to be
called PAIN (Pattern Analysis Item Nominator), that would
select a subset of items from the ETST. SEQUIN could
already select a subset of items from the ETST, but in a
way different from that proposed for PAIN. PAIN was based
on the belief that the pattern of responses could
contribute more to the overall value of a predictor than
was presently being obtained through the use of SEQUIN or
any other method. To exploit this, PAIN had to be able to
assign scores to each of the possible response patterns
associated with a subset of items. The score assigned to a
pattern in pattern analysis is the mean criterion score of
all subjects in a sample who have that pattern. Once a
correlation coefficient was determined for a given subset,
it would then be necessary to compare this coefficient with
that obtained through the examination of every other subset
of the same size available from the main set. For the
reasons explained below this was impractical, and an
alternative approach was necessary if PAIN was to be used.
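The pattern-scoring rule can be illustrated with a small Python sketch. It is not the author's FORTRAN code: the names `pattern_code` and `pattern_means`, the 0-based item indices, and the toy data are assumptions. Following the convention described in APPENDIX B, the first item of the subset is treated as the leftmost (high-order) binary digit. Patterns that no subject shows simply receive no mean.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def pattern_code(responses: Sequence[int], subset: Sequence[int]) -> int:
    """Read a subject's 0/1 answers to the subset items as one binary number."""
    code = 0
    for item in subset:                  # first subset item = high-order bit
        code = code * 2 + responses[item]
    return code


def pattern_means(data: List[Sequence[int]], criterion: Sequence[float],
                  subset: Sequence[int]) -> Dict[int, float]:
    """Mean criterion score of all subjects who share each observed pattern."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for responses, score in zip(data, criterion):
        code = pattern_code(responses, subset)
        totals[code] += score
        counts[code] += 1
    return {code: totals[code] / counts[code] for code in totals}


# Four hypothetical subjects answering three items; the subset uses items 0 and 2.
data = [[1, 0, 1], [1, 0, 0], [1, 0, 1], [0, 1, 1]]
grades = [80, 60, 70, 65]
print(sorted(pattern_means(data, grades, [0, 2]).items()))
# [(1, 65.0), (2, 60.0), (3, 75.0)]
```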

The number of different subsets of N items that can be
formed from a 70-item set is the number of combinations of
70 items taken N at a time. This meant that investigating a
subset as small as three items would have involved the
examination of 54,740 possible subsets, each of which
contained eight patterns of response. From the information
available concerning SEQUIN it appeared that a subset of at
least seven items would be necessary if improvement over
the methods presently available was desired.
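The combinatorial count can be checked directly. The following Python sketch uses the standard-library `math.comb`; the helper name `exhaustive_workload` is hypothetical, and the annotations need Python 3.9 or later. It reproduces the 54,740 three-item subsets and the eight patterns each one carries.

```python
from math import comb


def exhaustive_workload(total_items: int, subset_size: int) -> tuple[int, int]:
    """Number of subsets of the given size, and total patterns if every subset were examined."""
    subsets = comb(total_items, subset_size)       # total_items taken subset_size at a time
    return subsets, subsets * 2 ** subset_size    # each subset has 2**N patterns


for n in (1, 2, 3):
    subsets, patterns = exhaustive_workload(70, n)
    print(f"N = {n}: {subsets:,} subsets, {patterns:,} patterns")
```

For N = 3 this gives 54,740 subsets and 437,920 patterns in all, and the totals grow rapidly with N.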

A seven-item subset would allow the 1,500 subjects of
the validation sample to be placed in the 128 response
patterns involved with an average of slightly less than 12
subjects per pattern. This number was felt to be sufficient
to establish a fairly stable mean score for each pattern. A
second advantage of using seven items in the subset was
that it would allow ready comparison with the work of
Lieutenant K. Weinberg (personal communication), who had
used the same raw data to investigate the validity of the
seven items from the ETST selected as the best predictors
by SEQUIN. Unless a reasonable alternative to the
examination of all possible response patterns was taken,
however, this would have meant the investigation of over 77
trillion patterns, a job that was beyond even a computer
approach. This was just for the selection of the seventh
item of the subset!
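The "slightly less than 12 subjects per pattern" figure is simply the sample size divided by the number of patterns. A one-function Python sketch (hypothetical name `subjects_per_pattern`) also shows how quickly that average falls as items are added. It assumes an even spread; real subjects cluster, so some patterns receive far fewer.

```python
def subjects_per_pattern(sample_size: int, n_items: int) -> float:
    """Average number of subjects per response pattern, assuming an even spread."""
    return sample_size / 2 ** n_items


print(f"{subjects_per_pattern(1500, 7):.2f}")   # 11.72
print(f"{subjects_per_pattern(1500, 10):.2f}")  # 1.46
```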

In order to overcome the problem of size, the assumption
was made that once an item had been selected as the best
for a subset of given size, it would continue to be a part
of any larger subset. This allowed the author to say that
the item selected as the best item for the subset of one
item would be a part of the subset of two items, both of
which would be part of the subset of three items, and so
on. The same approach is used in both stepwise regression
and SEQUIN, and it reduces the selection of the seventh
item to an examination of slightly over 8,000 patterns
(64 remaining items x 128 patterns = 8,192). After PAIN was
operative, tests were made to determine the effects of the
item-retention assumption on the overall validity of the
solution.
To test the item-retention assumption, subsets of two
items each were selected randomly from each of the three
parts of the ETST to act as the two-item subset in the PAIN
program. The program was then allowed to select the items
needed to complete six-item subsets. The validities of
these subsets were then compared with the validity of a
subset in which all the items were selected by PAIN. The
two forced items were drawn from individual parts of the
ETST rather than from the total ETST because the results of
the unrestricted selection by PAIN indicated that certain
parts of the ETST were more valid than others.

PAIN operated by computing the mean criterion score for
each pattern of responses in a given subset, assigning
these scores to the subjects having that pattern of
responses, and correlating the assigned scores with the
subjects' final school grades. PAIN provided the following
information when run:

1. Validities of all subsets examined.

2. A list of the items that form the most valid subset
of a given size.

3. The validity of the most valid subset of each size.

The final form of PAIN is contained in the COMPUTER
PROGRAMS section. Representative run times and core storage
requirements for this program on the IBM 360 computer are
contained in Table 2 (p. 22). Details on the roles of
important variables and how this program can be adapted for
general use are contained in APPENDIX A.
D. CROSS-VALIDATION

The program that the author calls "cross-validation" is
in fact a combination of two separate programs. The first
section of the cross-validation program was written to
obtain mean scores for the patterns of responses to the
items selected from the validation sample. These means were
computed in the validation program but could not be output
there, because it was not known while the program was
running which subset would eventually be wanted. Since the
score for each pattern of response changed whenever a new
item was examined, it would have been necessary either to
store all scores for each pattern in the computer until the
best item for inclusion in the subset was found, or to
print out the pattern values of all subsets examined. On
the other hand, obtaining a mean score for each pattern was
relatively easy and quick once all of the items in the
subset were known.

The second part of the cross-validation program did in 
fact perform cross-validation. The program assigned the 
mean pattern scores from the validation sample to subjects 
having the same response patterns in the cross-validation 
sample and then correlated these scores with the final 
school grades of the 750 subjects in this sample. 

The fact that some patterns may not have been assigned
scores in the validation sample was handled by eliminating
from the cross-validation sample those subjects whose
patterns had not been assigned scores. This procedure was
considered acceptable because of the small number (eight)
of subjects who fell into this category for a seven-item
subset.

The cross-validation program provided the following
information, given the items that form the subset:

1. A coded identification of the pattern of responses.
APPENDIX B explains how to construct the patterns from
the code.

2. The mean score for each pattern encountered in the
validation sample.

3. An indication of which patterns of response were not
encountered in the validation sample.

4. The validity in the validation sample.

5. The validity in cross-validation.

6. The number of subjects eliminated from the
cross-validation sample because their patterns were not
scored in validation.

The cross-validation program was also used to
investigate what improvement in validity was obtainable
through the use of PAIN over a random selection of a
subset of items to use in pattern analysis.

The final form of the cross-validation program is
contained in the COMPUTER PROGRAMS section (p. 31). For a
seven-item subset this program had a running time of
approximately ten seconds on the IBM 360 computer and used a
core storage area of approximately 80K. APPENDIX A explains
in detail how this program can be adapted for general use.
IV. RESULTS



PAIN selected items 20, 14, 40, 56, 7, 5, and 33, in that
order, as the most predictive seven-item subset of the
ETST. The validity of the seven items was .828 in
validation for 1,500 subjects and .778 in cross-validation
for 750 subjects.

By using the pattern-analysis technique to score the
subset of items selected by PAIN, it was possible to exceed
the validity that had previously been attached to the ETST
as a predictor of final school grades. The Navy had
determined the predictive validity of the ETST to be
approximately .61 using the total number of the 70 items
answered correctly as the predictor. A subset of as few as
three items selected by PAIN achieved a predictive validity
of .66 in cross-validation.

The cross-validation result for the PAIN-selected items
was an improvement over the .72 validity obtained in
cross-validation by Lieutenant Weinberg in his study of
SEQUIN's seven best items for predicting final school
grades.

In the 15 cases investigated, the random selection of
the first two items of the subset did not improve the
validity of any six-item subset (see Table 3). The items
selected for the six-item subset under these conditions
consistently included items selected by PAIN when no
constraints were placed on the selection process. Eight
subsets included only one item each that had not appeared
in the unconstrained solution. Item number 16 was the only
"new" item to appear in a six-item subset more than once,
and it appeared in six different subsets.

Random selection of seven items for pattern analysis 
also consistently resulted in lower validities than those 
obtained through the use of PAIN (See Table 4). 

The author attempted using PAIN to select more than
seven items from the ETST in order to determine at what
point, if any, the validities in validation and/or
cross-validation would level off or decline. At the
ten-item level, which was near the program size limit
imposed by the computer system's core storage capacity, the
validity was still increasing for the validation sample.
The cross-validation sample, on the other hand, did show a
decline in predictive validity at the ten-item level (see
Figure 1 and Table 1).

Examination of the assigned scores associated with the 
128 patterns representing the best seven-item subset indi- 
cated that score assignments were not always directly 
related to the number of items correct in the subset. In 
some cases four correct items in the subset were assigned 
a lower score than three correct items. Also, the score as- 
signed to getting only a certain item correct in a subset 
of one size was not always the same as the score assigned 
to getting only that item correct in a different size 
subset. 
FIGURE 1

CORRELATION COEFFICIENTS IN VALIDATION AND CROSS-VALIDATION
OF ITEMS SELECTED BY PAIN

[Figure: correlation coefficient (vertical axis) for PAIN-selected subsets in validation and cross-validation]

Note: See Table 1 for exact values of validity coefficients.
V. CONCLUSIONS



This study has in part confirmed the value of using
pattern analysis as a predictive technique. While a method
such as stepwise regression would have assigned a single
fixed value to each of the items in the subset eventually
selected by PAIN, the pattern-analysis approach allowed the
value of each of these items to change when used in
different combinations. It appears that PAIN obtained a
higher validity than SEQUIN or stepwise-regression
techniques (MN 4112 student projects in progress concurrent
with this writing) because more complete use was made of
the available information when a pattern-analysis approach
to evaluation was used.

PAIN and the scoring section of the cross-validation 
program have provided a predictor of Electronics Technician 
School final grades superior to any other known to the 
author at the time of this writing. By using this technique 
of item selection on other tests, a series of short, highly 
predictive tests for other areas requiring evaluation could 
be formed. A word of caution is appropriate, though. The
subjects of this study answered the questions used to form
PAIN's seven-item subset while taking the 70-item ETST. The
effect on the results of taking a seven-item versus a
70-item test was not known at the time of this writing.
Until it is determined that the shortness of the test has
no adverse effect, full implementation of testing based on
subsets cannot proceed.
The consistent appearance of certain items in subsets
formed with and without constraints on PAIN, coupled with
the lower validities resulting when PAIN was constrained,
would seem to indicate that the process of retaining items
previously selected does not reduce the overall validity of
a subset. Although the solution obtained by this method may
not be a true optimal solution, it is questionable how much
could be gained by attempting to examine all possible
solutions.

The author would have preferred to use a much larger
sample so that a much closer examination could have been
made of the point at which further increases in subset
size cease to add value. It is probable that the decline in
validity in cross-validation at the ten-item level
experienced in this study was a result of having fewer than
two subjects available for each pattern of response. In
fact, at the ten-item level over 13 percent of the
cross-validation sample was unusable because of the lack of
a scored pattern. The limited availability of records and
the author's desire to avoid the possible problems
associated with using subjects who received final school
grades under differing grading systems restricted the size
of the samples used.

A key area for more investigation is that of selecting
predictors based on biographical information. Preliminary
studies by others who have used PAIN (concurrent MN 4112
students) indicate that this technique of selection may
have great value as a method of analyzing biographical data
in relation to various criteria. This would seem logical if
one agrees that patterns of information play a more
important role in the area of biographical information than
in the area of testing. APPENDIX C explains how some
biographical information can be converted into the binary
form necessary for PAIN, and also contains examples of some
of the preliminary results obtained by using PAIN in
conjunction with biographical information.
TABLE 1

CORRELATION COEFFICIENTS FOR SUBSETS OF THE ETST
IN VALIDATION AND CROSS-VALIDATION

NUMBER OF ITEMS
IN SUBSET          VALIDATION     CROSS-VALIDATION

      1              .51456           .50002
      2              .62723           .60086
      3              .69781           .66228
      4              .74034           .70940
      5              .77573           .72420
      6              .80276           .73857
      7              .82843           .77829
      8              .85161           .78925
      9              .87731           .81201
     10              .90346           .80213
TABLE 2

PAIN PROGRAM RUNNING TIMES AND CORE STORAGE REQUIREMENTS
FOR VARIOUS SIZE SUBSETS

NUMBER OF ITEMS    APPROXIMATE      APPROXIMATE
IN SUBSET          RUNNING TIME     CORE STORAGE REQUIRED

      7              6 Min.            270 K
      9              9 Min.            310 K
     10             13 Min.            360 K

Note: Figures presented are based on evaluating a 70-item
test from a sample of 1,500 subjects. Running times are for
the IBM 360 computer.
TABLE 3

PAIN RESULTS WITH CONSTRAINTS PLACED
ON THE FIRST TWO ITEMS OF A SUBSET

TWO ITEMS          ADDITIONAL ITEMS     SUBSET
CONSTRAINED        SELECTED             VALIDITY

17, 3              40, 14, 56, 7        .79603
6, 16              40, 56, 14, 21*      .79038
19, 6              40, 14, 56, 8*       .79074
1, 20              40, 14, 56, 7        .79789
4, 2               20, 40, 56, 7        .78513
39, 24             20, 14, 56, 16*      .78450
37, 21             16*, 14, 40, 56      .79393
36, 30             20, 14, 40, 7        .77809
32, 29             14, 40, 7, 56        .77285
34, 23             20, 14, 40, 16*      .77978
55, 42             16*, 14, 40, 7       .78773
[illegible]        14, 40, 7, 56        .76546
47, 63             14, 40, 7, 56        .78603
[illegible]        16*, 14, 40, 7       .79528
69, 53             16*, 40, 14, 20      .7?083

Note: Additional items selected are presented in the order
of selection by PAIN. The seven items originally selected
by PAIN were 20, 14, 40, 56, 7, 5, and 33, in that order.
*Indicates item not originally selected by PAIN.
TABLE 4

VALIDATION AND CROSS-VALIDATION RESULTS OF
PATTERN ANALYSIS TECHNIQUE USED ON
RANDOMLY SELECTED SUBSETS OF SEVEN ITEMS

RANDOMLY SELECTED ITEMS         VALIDATION    CROSS-VALIDATION

12, 14, 38, 41, 42, 48, 56        .75235          .67070
4, 40, 58, 63, 66, 67, 70         .66504          .59745
15, 23, 40, 48, 54, 58, 59        .67265          .57414
4, 10, 17, 26, 34, 45, 55         .71356          .69899
6, 14, 41, 42, 53, 56, 66         .73571          .66944

Note: PAIN validation and cross-validation results for
seven items were .82843 and .77829, respectively.
APPENDIX A

PAIN AND CROSS-VALIDATION PROGRAM CONVERSION
FOR GENERAL USE

The PAIN and cross-validation programs in this study
can be used to process any data of a binary nature simply
by altering the contents of the DIMENSION and DATA
statements and ensuring that the READ statement conforms to
the device from which the data are being read. This
APPENDIX is a detailed checklist of how the DATA and
DIMENSION statements should be set up by the user.

A. PAIN DATA STATEMENT

1. N1 is set equal to the size of the sample being used
in validation.

2. N2 is set equal to one (1). The program will handle
increasing this variable to conform to the size of the
subset under consideration.

3. N3 is set at a value equal to or greater than the
integer range of the criterion scores. If the criterion
scores are not in integer form, conversion must be made
before the data are read into the program so that the
matrix involved can be addressed. A data conversion program
can be altered to do this if such a program is used.
Example: A 3.45 criterion score can be converted to a 345
criterion score. If conversion were not made in advance,
the program would truncate this criterion score to 3. See
A7 in this Appendix for further details. The alternative to
using integer criterion values would require a degree of
manipulation within PAIN that is unwarranted for most
cases.

4. N4 is set equal to two (2). The program will handle
increasing this variable to conform to the number of
patterns within a given subset size.

5. N5 is set equal to the final number of items desired
in the subset.

6. N6 is set equal to the total number of items in the
set under investigation.

7. INDEX is equal to a value one less than the lower
number used in determining the range of the criterion (N3).
Example: If the criterion were student grades on a 4.0
grading scale and the investigator did not know the actual
value of the lowest grade in the sample, but knew the
lowest grade was higher than 1.5, the value of N3 could be
set as low as 40 - 15, or 25, and the value of INDEX would
be 15 - 1, or 14. Note that if the lowest value for a final
grade had actually been 1.5 instead of higher than 1.5, the
conversion would have been 40 - 14, or 26, for the value of
N3 and 14 - 1, or 13, for the value of INDEX. Although PAIN
can be run without assigning a value to INDEX, in many
cases the core storage and running time of the program can
be greatly reduced by using the INDEX variable.
B. CROSS-VALIDATION DATA STATEMENT

1. In order to perform cross-validation, both the
validation and the cross-validation data must be read into
the program, in that order.

2. N1, N3, and INDEX follow the same rules as in the
PAIN DATA statement.

3. N2 is set equal to the number of items in the subset
being cross-validated or for which mean pattern scores are
desired.

4. N4 is set equal to the number of patterns associated
with the subset size being cross-validated. This value is
equal to 2 to the N2 power.

5. N7 is set equal to the number of subjects in the
cross-validation sample. Setting this value to zero (0)
results in processing only the mean pattern scores for the
validation sample.

C. DIMENSION STATEMENTS

1. The variables in the DIMENSION statements are
dimensioned according to the comments at the beginning of
PAIN and the cross-validation programs.

2. Definitions of the variables involved in the
DIMENSION statements are contained in the list of non-dummy
variables preceding the computer programs in the COMPUTER
PROGRAMS section (p. 31).
APPENDIX B

INTERPRETING PATTERN CODES

All patterns of response were converted from binary
to decimal form during PAIN and the cross-validation
programs so that matrix addresses could be used. Hence, the
list of patterns printed as output of the cross-validation
program is in decimal form. The user of this program simply
needs to subtract one from the pattern number in the output
to obtain the actual decimal equivalent of the binary number
of the pattern referenced. For example, the
cross-validation program assigned a mean pattern score of
73.0 to the pattern listed as number 58. This converts to
the decimal equivalent 57, which yields the binary pattern
0111001. Since the items of the subset used were read into
the cross-validation program in the order 5, 7, 14, 20, 33,
40, 56, the pattern indicates that a "correct" or "yes"
answer to items 7, 14, 20, and 56, coupled with an
"incorrect" or "no" answer to items 5, 33, and 40, predicts
a criterion score of 73.0.
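The conversion just described can be reproduced mechanically. In this Python sketch (hypothetical function `decode_pattern`; annotations need Python 3.9 or later), the printed pattern number is reduced by one, written as a binary number with one digit per item, and the digits are matched to the items in the order they were read in.

```python
def decode_pattern(listed_number: int, items: list[int]) -> tuple[list[int], list[int]]:
    """Split the subset items into those answered correctly and incorrectly."""
    code = listed_number - 1                    # printed numbers are one greater than the code
    if not 0 <= code < 2 ** len(items):
        raise ValueError("pattern number out of range for this subset size")
    bits = format(code, f"0{len(items)}b")      # first item read in = leftmost digit
    correct = [item for item, bit in zip(items, bits) if bit == "1"]
    incorrect = [item for item, bit in zip(items, bits) if bit == "0"]
    return correct, incorrect


print(decode_pattern(58, [5, 7, 14, 20, 33, 40, 56]))
# ([7, 14, 20, 56], [5, 33, 40])
```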
APPENDIX C



BIOGRAPHICAL INFORMATION AND PAIN 



To convert biographical information into the binary 
form necessary for use in PAIN requires questions to be 
formulated such that answers can be expressed in a yes-no 
form. 

Questions that at first do not appear to fit a yes-no
format are already being divided into parts so that that
format can be used. For example, the question "How old are
you?" can be, and often is, handled in the following way:

How old are you? Check one box.

1. under 21     □
2. 21 - 30      ☒
3. 31 - 40      □
4. over 40      □

The boxes without checks can be considered "no" answers
in this example, with the checked box a "yes". The one
question "How old are you?" can now be handled by PAIN as
four separate items. If PAIN should indicate that one or
more of these items represent good predictors, those items
could be subdivided for further evaluation. Other types of
biographical information can also be handled in this
manner.

The author knows of at least two studies, being conducted
by students at the U.S. Naval Postgraduate School at the
time of this writing, that involved the use
of PAIN for selecting items of a biographical nature for
use in predicting various criteria. A study using the final
QPAs of students in the Masters program at the U.S. Naval
Postgraduate School as the criterion has yielded
encouraging preliminary results. While samples were too
small to justify comparison in cross-validation, PAIN
provided uniformly higher validities than stepwise
regression in validation.

The second study involved predicting drug addiction. 
This study had found a four-item subset that had a validity 
of over .60 in validation. No cross-validation results were 
available at the time of this writing, but any reasonable 
retention of validity in cross-validation could provide an 
extremely useful predictor in this field. 
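The age question shown earlier in this appendix can be turned into four binary items for PAIN with a few lines of Python. The function name `age_items`, the integer-age input, and the reading of "under 21" as 20 or younger are assumptions made for illustration.

```python
# (label, lowest age, highest age); None means the band is open-ended.
AGE_BANDS = [
    ("under 21", None, 20),
    ("21 - 30", 21, 30),
    ("31 - 40", 31, 40),
    ("over 40", 41, None),
]


def age_items(age: int) -> list[int]:
    """One yes (1) / no (0) item per box: only the box containing the age is checked."""
    return [
        1 if (low is None or age >= low) and (high is None or age <= high) else 0
        for _label, low, high in AGE_BANDS
    ]


print(age_items(25))  # [0, 1, 0, 0]
```

Each returned position can then be treated as a separate PAIN item, and a band that proves predictive could be split further in the same way.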






COMPUTER PROGRAMS 



LIST OF NON-DUMMY VARIABLES USED IN COMPUTER PROGRAMS

ANS(I)   - correct response to item I of test
B(J)     - jth subject's assigned decimal value for his binary
           pattern of responses
C(J)     - jth subject's final school grade used for the
           criterion
C1       - sum of the criterion scores
C2       - sum of the squares of the criterion scores
D(J)     - jth subject's identification number
F(M,N)   - the joint frequency distribution of patterns
           versus criterion scores
INDEX    - a value equal to the lowest score used in
           determining range (N3) minus 1
K        - as used in the conversion program only, the number
           of the data card on which information is stored
M        - row in the "F" matrix representing the number of a
           pattern in a given subset
N        - column in the "F" matrix representing the
           criterion score
N1       - the size of the sample used in validation
N2       - number of items in the subset being considered
N3       - coded range of the criterion scores
N4       - number of patterns in the subset being considered
N5       - size of subset desired
N6       - total number of items in set being investigated
N7       - the size of the sample used in cross-validation
P(I,J)   - jth subject's binary response to item I
R2       - the correlation coefficient determined from the
           use of raw scores
S(I)     - the mean pattern score for pattern I
S1       - sum of the criterion scores for a given pattern
S2       - number of subjects with a given pattern
W(I)     - answer given by subject to item I of test
X(J)     - jth subject's mean pattern score
X1       - sum of mean pattern scores
X2       - sum of squares of mean pattern scores
DATA CONVERSION PROGRAM

C     THIS PROGRAM EDITS INFORMATION ABOUT SUBJECTS WHO HAVE
C     TAKEN THE ETST, CONVERTS TEST DATA TO A BINARY FORM SUCH
C     THAT A CORRECT ANSWER IS ASSIGNED THE VALUE '1' AND AN
C     INCORRECT ANSWER IS ASSIGNED THE VALUE '0'. THIS
C     INFORMATION IS THEN TRANSFERRED TO A DATA CELL.
C
      IMPLICIT INTEGER*4(A-Z)
      DIMENSION P(70),ANS(70),W(70)
      DATA NREAD,NWRITE,NPUNCH/8,9,7/
C
C     THE CORRECT ANSWERS TO ALL ETST QUESTIONS ARE READ IN
C
      READ (5,1) (ANS(I),I=1,70)
    1 FORMAT (70I1)
C
C     INFORMATION ON EACH SUBJECT IS READ IN
C
      DO 100 J=1,2400
      IF(J.EQ.2398) GO TO 50
   10 READ (NREAD,2,END=50) D,K,(W(I),I=1,70)
    2 FORMAT (T2,I6,I1,70I1)
      IF(K.NE.5) GO TO 10
      READ (NREAD,4) K,C
    4 FORMAT (T8,I1,T64,I2)
      IF(K.NE.6) GO TO 10

C
C     CONVERT EACH SUBJECT'S ETST ANSWERS TO BINARY FORM
C
      DO 20 I=1,70
      IF(W(I).NE.ANS(I)) P(I)=0
      IF(W(I).EQ.ANS(I)) P(I)=1
   20 CONTINUE
C
C     OUTPUT THE EDITED AND CONVERTED INFORMATION
C
      IF(J.GT.44) GO TO 83
      GO TO 84
   83 IF(J.GT.150) GO TO 84
      WRITE (6,85) J,C,(P(I),I=1,70)
   85 FORMAT (I5,5X,I3,5X,70I1)
   84 WRITE (NWRITE,8) D,C,(P(I),I=1,70)
    8 FORMAT (I6,I2,70I1)
  100 CONTINUE
   50 WRITE (6,9) D,C,(P(I),I=1,70)
    9 FORMAT (I8,5X,I2,5X,70I1)
      STOP
      END
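The scoring step at the heart of the conversion program, comparing each of the 70 ETST answers W(I) with the key ANS(I) and writing the subject's identification number, criterion score, and binary pattern in the layout of FORMAT 8 (I6,I2,70I1), can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, the card-type checks on K (the tests against 5 and 6) are left out, and each subject's identification number, grade, and answers are assumed to have been gathered already.

```python
from typing import List, Sequence


def to_binary(answers: Sequence[int], key: Sequence[int]) -> List[int]:
    """Score each answer against the key: 1 if it matches, 0 otherwise
    (the P(I)=1 / P(I)=0 assignments in loop 20)."""
    if len(answers) != len(key):
        raise ValueError("answer record and key differ in length")
    return [1 if a == k else 0 for a, k in zip(answers, key)]


def format_record(ident: int, grade: int, bits: Sequence[int]) -> str:
    """Lay out one output record like FORMAT (I6,I2,70I1): a 6-column
    identification number, a 2-column grade, then one column per item."""
    return f"{ident:6d}{grade:2d}" + "".join(str(b) for b in bits)


if __name__ == "__main__":
    key = [3, 1, 4, 1, 5]        # hypothetical five-item key
    answers = [3, 2, 4, 1, 0]    # hypothetical answer record
    bits = to_binary(answers, key)
    print(bits)                                   # [1, 0, 1, 1, 0]
    print(repr(format_record(123456, 58, bits)))  # '1234565810110'
```

Unlike Fortran's I6, which prints asterisks when a value does not fit, the Python format simply widens the field, so the sketch assumes identification numbers of at most six digits.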
PAIN ITEM SELECTION PROGRAM

C     THIS PROGRAM SELECTS A SUBSET OF ITEMS THAT MAXIMIZES
C     VALIDITY UNDER PATTERN ANALYSIS. THE VALUES ASSIGNED TO
C     THE DIMENSIONED VARIABLES ARE: B(N1), C(N1), D(N1),
C     F(2**N5,N3), P(N6,N1), S(2**N5), X(N1), ITEM(N5)
C
      INTEGER*2 B,C,F,P
      INTEGER C1,C2,R3,S1,S2
      DIMENSION B(1500),C(1500),D(1500),F(128,47),
     2P(70,1500),S(128),X(1500),ITEM(7)
      DATA N1,N2,N3,N4,N5,N6,INDEX/1500,1,47,2,7,70,29/
C
C     THE SUBSET THAT WILL CONTAIN THE TEST ITEMS SELECTED IS
C     INITIALIZED TO ZERO
C
      DO 11 I=1,N5
      ITEM(I)=0
   11 CONTINUE
C
C     DATA IS READ INTO THE PROGRAM
C
      C1=0
      C2=0
      DO 13 J=1,N1
  350 READ (9,9) D(J),C(J),(P(I,J),I=1,N6)
    9 FORMAT (I6,I2,70I1)
C
C     THE FOLLOWING TWO IF STATEMENTS PREVENT THE CONSIDERATION
C     OF ANY SUBJECT WHO HAS A CRITERION SCORE OUTSIDE THE
C     RANGE LIMITS USED IN ESTABLISHING N3 AND INDEX
C
      IF(C(J).LT.30) GO TO 350
      IF(C(J).GT.76) GO TO 350
      C1=C(J)+C1
      C2=C(J)*C(J)+C2
   13 CONTINUE

C     THIS LOOP CONTROLS WHICH ITEM OF THE SUBSET IS BEING
C     SELECTED DURING THE CURRENT ROUND OF EXAMINATIONS
C
      DO 220 L=1,N5
      RH=0.0
C
C     THIS LOOP CONTROLS WHICH ITEM FROM THE TOTAL SET OF ITEMS
C     IS BEING CONSIDERED FOR EXAMINATION
C
      DO 200 KA=1,N6
C
C     THIS LOOP PREVENTS CONSIDERATION OF AN ITEM ALREADY
C     SELECTED TO BE A MEMBER OF THE SUBSET
C
      DO 14 I=1,N2
      IF(KA.EQ.ITEM(I)) GO TO 200
   14 CONTINUE
C
C     THE F MATRIX IS INITIALIZED TO ZERO
C
      DO 12 I=1,N4
      DO 17 J=1,N3
      F(I,J)=0
   17 CONTINUE
   12 CONTINUE
C
C     THE JOINT FREQUENCY DISTRIBUTION OF PATTERN AND CRITERION
C     SCORES IS DETERMINED AND BINARY PATTERNS ARE CONVERTED TO
C     DECIMAL EQUIVALENTS TO BE USED AS ROW ADDRESSES.
C
      DO 18 J=1,N1
      M=1
      K=N4
      DO 19 I=1,N2
      K=K/2
      IF(ITEM(I).EQ.0) GO TO 19
      M=K*P(ITEM(I),J)+M
   19 CONTINUE
      M=K*P(KA,J)+M
      N=C(J)-INDEX
      F(M,N)=F(M,N)+1
      B(J)=M
   18 CONTINUE
C
C     THE MEAN CRITERION SCORES FOR EACH PATTERN ARE COMPUTED
C
      DO 20 I=1,N4
      S1=0
      S2=0
      DO 21 J=1,N3
      S2=F(I,J)+S2
      S1=(J+INDEX)*F(I,J)+S1
   21 CONTINUE
      IF(S2.EQ.0) GO TO 10
      S(I)=S1/S2
      GO TO 20
   10 S(I)=0.0
   20 CONTINUE
C
C     MEAN SCORES FOR PATTERNS ARE ASSIGNED TO EACH SUBJECT
C     ACCORDING TO HIS PATTERN
C
      DO 31 J=1,N1
      K=B(J)
      X(J)=S(K)
   31 CONTINUE

C     CORRELATION COEFFICIENT BETWEEN CRITERION AND PATTERN
C     FORMED USING ITEM UNDER CONSIDERATION IS COMPUTED.
C
      X1=0.0
      X2=0.0
      W=0.0
      DO 41 J=1,N1
      X1=X(J)+X1
      X2=X(J)*X(J)+X2
      W=C(J)*X(J)+W
   41 CONTINUE
      R1=(N1*X2)-(X1*X1)
      IF(R1.EQ.0.0) GO TO 85
      R3=(N1*C2)-(C1*C1)
      R5=(N1*W)-(C1*X1)
      Q=(R1*R3)**0.5
      R2=R5/Q
      GO TO 86
   85 R2=0.0
C
C     IT IS DETERMINED IF THE CORRELATION COEFFICIENT USING
C     ITEM PRESENTLY UNDER CONSIDERATION IS HIGHER THAN HIGHEST
C     PREVIOUSLY FOUND CORRELATION COEFFICIENT WITH SAME SIZE
C     SUBSET. ITEM NUMBER AND CORRELATION COEFFICIENT ARE
C     STORED IF THEY ARE HIGHER.
C
   86 WRITE (6,54) KA,R2
   54 FORMAT (' THE CORRELATION OF ITEM ',I4,' WITH THE ',
     2'PREVIOUSLY SELECTED ITEMS IS',F10.9)
      IF(R2.GT.RH) GO TO 81
      GO TO 200
   81 RH=R2
      ITEMH=KA
  200 CONTINUE
C
C     PRINT OUT THE ITEM NUMBERS OF ITEMS USED TO GET THE
C     HIGHEST CORRELATION COEFFICIENT FOR THE SIZE SUBSET UNDER
C     CONSIDERATION AND THE VALUE OF THIS COEFFICIENT.
C
      ITEM(L)=ITEMH
      WRITE (6,160) L,L,(ITEM(I),I=1,N5),RH
  160 FORMAT ('0','THE BEST ',I2,' ITEMS TO USE FOR A ',I2,
     2' ITEM SUBSET ARE:',/' ',10I5,' AND THEY YIELD A ',
     3'CORRELATION COEFFICIENT OF ',F10.9,//)
C
C     ADVANCE THE NUMBER OF ITEMS IN THE SUBSET AND THE NUMBER
C     OF PATTERNS POSSIBLE THEN REPEAT THE EXAMINATION PROCESS.
C
      N2=N2+1
      N4=2**N2
  220 CONTINUE
      STOP
      END
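The selection logic of PAIN can be restated compactly. The Python sketch below is an illustration under stated assumptions, not a transcription of the listing: responses are a list of 0/1 rows (one per subject), items are numbered from 0, pattern means are floating-point (the listing divides the INTEGER sums S1 and S2), and a round stops if no candidate yields a positive correlation, whereas the listing would reuse whatever ITEMH held from the previous round. As in loops 18 and 19, the earliest selected item is the most significant bit of a pattern and the candidate item the least significant. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pattern_index(bits: Sequence[int]) -> int:
    """Binary pattern -> decimal value, first bit most significant.
    (The listing adds 1 so the value can serve as a row address of F.)"""
    m = 0
    for b in bits:
        m = 2 * m + b
    return m


def pearson_raw(x: Sequence[float], y: Sequence[float]) -> float:
    """Raw-score Pearson r (the R1, R3, R5 terms of the listing).
    Returns 0.0 when either variable is constant; the listing guards R1 only."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    r1 = n * sum(v * v for v in x) - sx * sx
    r3 = n * sum(v * v for v in y) - sy * sy
    if r1 == 0 or r3 == 0:
        return 0.0
    r5 = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    return r5 / math.sqrt(r1 * r3)


def pattern_means(items, responses, criterion) -> Dict[int, float]:
    """Mean criterion score for every pattern that occurs (S(I))."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for row, c in zip(responses, criterion):
        m = pattern_index([row[i] for i in items])
        sums[m] = sums.get(m, 0) + c
        counts[m] = counts.get(m, 0) + 1
    return {m: sums[m] / counts[m] for m in sums}


def pain_select(responses, criterion, size) -> Tuple[List[int], List[float]]:
    """Forward selection: each round adds the item whose pattern-mean
    scores correlate highest with the criterion."""
    selected: List[int] = []
    best_rs: List[float] = []
    for _ in range(size):
        best_r, best_item = 0.0, None  # RH=0.0 at the start of each round
        for cand in range(len(responses[0])):
            if cand in selected:
                continue
            items = selected + [cand]
            means = pattern_means(items, responses, criterion)
            x = [means[pattern_index([row[i] for i in items])] for row in responses]
            r = pearson_raw(x, criterion)
            if r > best_r:
                best_r, best_item = r, cand
        if best_item is None:
            break
        selected.append(best_item)
        best_rs.append(best_r)
    return selected, best_rs


if __name__ == "__main__":
    responses = [[1, 0], [1, 1], [0, 0], [0, 1]]  # hypothetical data
    criterion = [70, 70, 40, 40]
    print(pain_select(responses, criterion, 2))  # ([0, 1], [1.0, 1.0])
```

Each round refits the pattern means from scratch for every candidate, exactly as the listing rebuilds F inside loop 200; the cost grows with the number of items times the number of subjects per round, which is why the listing caps the subset size at N5.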
CROSS-VALIDATION PROGRAM

// EXEC FORTCLG
C     THIS PROGRAM PRINTS THE MEAN PATTERN SCORES FOR THE
C     SUBJECTS IN THE VALIDATION SAMPLE AND CROSS-VALIDATES THE
C     CROSS-VALIDATION SAMPLE. THE VALUES ASSIGNED TO THE
C     DIMENSIONED VARIABLES ARE: B(N1), C(N1), F(N4,N3), P(N2,N1),
C     S(N4), X(N1)
C
      INTEGER*2 B,C,D,F,P
      INTEGER*4 C1,C2,R3,S1,S2
      DIMENSION B(1500),C(1500),F(128,47),P(7,1500),S(128),X(1500)
      DATA N1,N2,N3,N4,N7,INDEX/1500,7,47,128,750,29/
C
C     DATA IS READ IN FROM THE INPUT DEVICE
C
      DO 220 L=1,2
      IF(L.EQ.2) M=N7
      IF(M.EQ.0) GO TO 900
      DO 13 J=1,N1
  350 READ (9,9) C(J),(P(I,J),I=1,N2)
    9 FORMAT (T7,I2,T13,I1,T15,I1,T22,I1,T28,I1,T30,I1,
     2T48,I1,T64,I1)
C
C     THE FOLLOWING TWO IF STATEMENTS PREVENT THE CONSIDERATION
C     OF ANY SUBJECT WHO HAS A CRITERION SCORE OUTSIDE THE
C     RANGE LIMITS USED IN ESTABLISHING N3 AND INDEX
C
      IF(C(J).LT.30) GO TO 350
      IF(C(J).GT.76) GO TO 350
   13 CONTINUE
      RH=0.0
C
C     THE F MATRIX IS INITIALIZED TO ZERO
C
      DO 12 I=1,N4
      DO 17 J=1,N3
      F(I,J)=0
   17 CONTINUE
   12 CONTINUE
C
C     THE JOINT FREQUENCY DISTRIBUTION OF PATTERNS AND
C     CRITERION SCORES IS DETERMINED AND BINARY PATTERNS ARE
C     CONVERTED TO DECIMAL EQUIVALENTS TO BE USED AS ROW
C     ADDRESSES
C
      DO 18 J=1,N1
   29 M=1
      K=N4
      DO 19 I=1,N2
      K=K/2
      M=K*P(I,J)+M
   19 CONTINUE
      N=C(J)-INDEX
      F(M,N)=F(M,N)+1
      B(J)=M
   18 CONTINUE
      IF(L.EQ.2) GO TO 200

C     THE MEAN CRITERION SCORES FOR EACH PATTERN ARE COMPUTED
C     AND PRINTED OUT.
C
      WRITE (6,94)
   94 FORMAT (' ',5X,' PATTERN NUMBER ',10X,' MEAN CRITERION ',
     2'SCORE',//)
      DO 20 I=1,N4
      S1=0
      S2=0
      DO 21 J=1,N3
      S2=F(I,J)+S2
      S1=(J+INDEX)*F(I,J)+S1
   21 CONTINUE
      IF(S2.LT.1.0) GO TO 10
      S(I)=S1/S2
      WRITE (6,93) I,S(I)
   93 FORMAT (' ',11X,I3,20X,F12.5)
      GO TO 20
   10 S(I)=0.0
      WRITE (6,90) I
   90 FORMAT (' NO SUBJECT HAD THE FOLLOWING PATTERN DURING ',
     2'ITEM SELECTION PROGRAM : ',I4)
   20 CONTINUE
  200 CONTINUE
C
C     MEAN SCORES FOR PATTERNS ARE ASSIGNED TO EACH SUBJECT.
C
      NR=0
      DO 31 J=1,N1
      K=B(J)
      X(J)=S(K)
      IF(S(K).LT.0.9) GO TO 91
      GO TO 31
   91 NR=NR+1
      C(J)=0
   31 CONTINUE
      IF(L.EQ.1) GO TO 32
      WRITE (6,92) NR
   92 FORMAT (I6,' SUBJECTS HAVE PATTERN SCORES NOT ',
     2'ENCOUNTERED DURING THE PAIN PROGRAM')

C
C     THE CORRELATION COEFFICIENT IN VALIDATION AND
C     CROSS-VALIDATION FOR THE ITEMS UNDER CONSIDERATION IS
C     COMPUTED AND PRINTED OUT.
C
   32 C1=0
      C2=0
      X1=0.0
      X2=0.0
      W=0.0
      DO 41 J=1,N1
      C1=C(J)+C1
      C2=C(J)*C(J)+C2
      X1=X(J)+X1
      X2=X(J)*X(J)+X2
      W=C(J)*X(J)+W
   41 CONTINUE
      M=N1-NR
      R1=(N1*X2)-(X1*X1)
      R3=(N1*C2)-(C1*C1)
      R5=(N1*W)-(C1*X1)
      Q=(R1*R3)**0.5
      R2=R5/Q
   81 RH=R2
      IF(L.EQ.2) GO TO 220
      WRITE (6,160) RH
  160 FORMAT ('0','THE ITEMS UNDER CONSIDERATION YIELD A ',
     2'VALIDITY OF ',F10.5,' IN VALIDATION')
  220 CONTINUE
      WRITE (6,54) N2,RH
   54 FORMAT (' THE VALIDITY OF THE ',I2,' ITEMS CONSIDERED ',
     2'FOR CROSS-VALIDATION IS ',F10.5)
  900 STOP
      END
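The cross-validation program's central idea, scoring a second sample with the pattern means learned from the validation sample, can be sketched the same way. This is again a hypothetical Python restatement rather than the author's code: it assumes the item numbers chosen by PAIN are supplied (counting from 0), and it simply leaves out cross-validation subjects whose pattern never occurred in the validation sample, reporting how many there were. The listing handles those subjects differently, setting their criterion and pattern scores to zero and then forming the sums over all N1 subjects.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pattern_index(bits: Sequence[int]) -> int:
    """Binary pattern -> decimal value, first bit most significant."""
    m = 0
    for b in bits:
        m = 2 * m + b
    return m


def pearson_raw(x: Sequence[float], y: Sequence[float]) -> float:
    """Raw-score Pearson r; 0.0 if either variable is constant."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    r1 = n * sum(v * v for v in x) - sx * sx
    r3 = n * sum(v * v for v in y) - sy * sy
    if r1 == 0 or r3 == 0:
        return 0.0
    r5 = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    return r5 / math.sqrt(r1 * r3)


def cross_validate(items: Sequence[int],
                   val_resp, val_crit, cv_resp, cv_crit) -> Tuple[float, int]:
    """Return (cross-validation r, number of subjects with unseen patterns)."""
    # Pattern means S(I) come from the validation sample only.
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for row, c in zip(val_resp, val_crit):
        m = pattern_index([row[i] for i in items])
        sums[m] = sums.get(m, 0) + c
        counts[m] = counts.get(m, 0) + 1
    means = {m: sums[m] / counts[m] for m in sums}

    # Score the cross-validation sample with those fixed means.
    x: List[float] = []
    y: List[float] = []
    unseen = 0
    for row, c in zip(cv_resp, cv_crit):
        m = pattern_index([row[i] for i in items])
        if m not in means:
            unseen += 1  # plays the role of NR in the listing
            continue
        x.append(means[m])
        y.append(c)
    return pearson_raw(x, y), unseen


if __name__ == "__main__":
    val = [[1, 1], [1, 1], [0, 0], [0, 0]]       # hypothetical data
    cv = [[1, 1], [0, 0], [1, 1], [0, 0], [1, 0]]
    r, unseen = cross_validate([0, 1], val, [70, 60, 40, 50],
                               cv, [66, 44, 64, 46, 55])
    print(round(r, 4), unseen)  # 0.995 1
```

Because the means are frozen from the validation sample, any shrinkage of r between the two samples reflects how well the selected patterns generalize, which is the question the cross-validation run is meant to answer.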
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